Although we visualize data as a two-dimensional grid for mathematical convenience, the hardware sees only a contiguous 1-D stream of bytes. Understanding this "linear reality" is the key to implementing row-wise reduction patterns, such as finding the maximum or the sum of exponentials.
1. The "Linear Flattening" Principle
Every multidimensional tensor is physically stored sequentially. To implement $\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$, we must identify the linear segment representing a given row and traverse it to compute the maximum and the sum.
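The flattening idea can be sketched in plain Python (no Triton, no GPU): a "2D" tensor is one flat buffer, and a row is recovered with `start = row_index * stride`, the same pointer arithmetic a kernel performs. The names `row_segment`, `flat`, and `n_cols` are illustrative, not part of any library.

```python
flat = [1.0, 2.0, 3.0,   # row 0
        4.0, 5.0, 6.0]   # row 1
n_cols = 3               # stride between consecutive rows

def row_segment(buf, row, stride):
    """Return the linear slice of `buf` that holds one logical row."""
    start = row * stride
    return buf[start:start + stride]

row1 = row_segment(flat, 1, n_cols)
row_max = max(row1)   # first linear reduction over the segment
row_sum = sum(row1)   # second linear reduction over the segment
```

The two reductions at the end are exactly the traversals a Softmax kernel must perform over each row's linear segment.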
2. Numerical Stability
Why does Softmax need stabilization? Large input values cause $e^{x}$ to explode. We achieve stability by computing $$\exp(x_i - \max(x))$$ This requires the kernel designer to perform two linear reductions (first the max, then the sum) before the final normalization.
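The two-pass scheme can be sketched as a plain-Python reference implementation (a sketch of the algorithm, not a Triton kernel; the function name `stable_softmax` is illustrative):

```python
import math

def stable_softmax(row):
    """Stable softmax via two linear passes over the row segment."""
    m = max(row)                            # pass 1: max reduction
    exps = [math.exp(x - m) for x in row]   # largest exponent is now e^0 = 1
    s = sum(exps)                           # pass 2: sum reduction
    return [e / s for e in exps]

probs = stable_softmax([10.0, 20.0, 80.0])
```

Because every shifted exponent is at most $e^0 = 1$, neither the exponentials nor their sum can overflow, yet the output ratios are mathematically identical to the naive formula.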
3. Validation with Short Rows
When developing Triton kernels, we first test with short rows (e.g., width 16) to ensure our linear pointer arithmetic captures every element correctly, before scaling up to production workloads.
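The short-row check can be mimicked on the CPU: walk a flat buffer with `row_index * stride` arithmetic and confirm every element is visited exactly once, in order. This is a hedged sketch of the validation idea; `WIDTH` and `N_ROWS` are arbitrary test parameters, not Triton settings.

```python
WIDTH = 16    # a deliberately short row width for validation
N_ROWS = 4
flat = list(range(N_ROWS * WIDTH))   # flat buffer standing in for GPU memory

visited = []
for r in range(N_ROWS):
    start = r * WIDTH                # row_index * stride
    visited.extend(flat[start:start + WIDTH])
```

If `visited` reproduces `flat` exactly, the 2D-to-1D index mapping captures every element with no gaps or overlaps, and the same arithmetic can be trusted at production widths.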
QUESTION 1
How are 2D tensors physically arranged in GPU memory?
A. As nested hardware folders.
B. As a contiguous 1D stream of bytes.
C. In a hexagonal lattice.
D. As independent scalar registers.
✅ Correct (B): Regardless of dimensionality, memory is fundamentally linear; strides define the 'jumps' between rows.
❌ Incorrect: Modern RAM and Global Memory are addressed linearly.
QUESTION 2
What is the primary reason for performing a row-wise max reduction before exponentiation?
A. To sort the data for faster access.
B. To ensure numerical stability and prevent overflow.
C. To reduce the memory footprint of the tensor.
D. To align the data with 32-byte boundaries.
✅ Correct (B): Subtracting the maximum ensures the largest exponent is $e^0 = 1$, preventing float16/float32 overflow.
❌ Incorrect: This is known as the 'Max Trick' for numerical stability in Softmax.
QUESTION 3
In the context of the Linear Reality, what is a reduction pattern?
A. The process of deleting unused rows.
B. Compressing the tensor using ZIP algorithms.
C. Aggregating multiple values into a single statistic (e.g., sum, max).
D. Reducing the clock speed of the GPU.
✅ Correct (C): Reductions iterate over a dimension to produce a scalar per row/column.
❌ Incorrect: Reductions are about data aggregation, not compression or deletion.
QUESTION 4
Why is testing performed on 'short rows' first?
A. Short rows consume more power.
B. To verify indexing logic without complex tiling overhead.
C. Short rows are stored in L1 cache only.
D. Triton cannot handle rows longer than 1024.
✅ Correct (B): Verifying small cases ensures the mathematical mapping between 2D indices and 1D memory is correct.
❌ Incorrect: It is a debugging strategy, not a hardware limitation.
QUESTION 5
Which formula represents the stable version of Softmax?
A. $$e^{x_i} / \sum e^{x_j}$$
B. $$\text{max}(x) / \text{sum}(x)$$
C. $$\frac{e^{x_i - \max(x)}}{\sum e^{x_j - \max(x)}}$$
D. $$x_i - \text{avg}(x)$$
✅ Correct (C): Subtracting the max from every element before exponentiation maintains the same mathematical ratio while staying within numerical bounds.
❌ Incorrect: The naive formula (Option A) is mathematically correct but numerically unstable.
Case Study: Numerical Overflow in Softmax
Stabilizing FP16 Kernels
A developer implements a Softmax kernel for a model using FP16 precision. The inputs for one row are [10.0, 20.0, 80.0]. The kernel produces 'NaN' outputs.
Q
Why did the 'NaN' occur, and what is the maximum value $e^x$ can reach in FP16 before overflowing?
Solution:
In FP16, the maximum representable value is 65,504. $e^{80}$ is significantly larger than this, resulting in 'inf'. Dividing by 'inf' or encountering it in sums leads to 'NaN' results. The 'Max Trick' is required.
Q
Describe the implementation of row-wise max and sum for this scenario.
Solution:
First, perform a pass over the linear memory segment for the row to identify '80.0' as the max. Subtract 80.0 from all elements ([ -70, -60, 0 ]). Then compute $e^{-70}, e^{-60}, e^0$. The sum is now stable (approx 1.0), and the final division proceeds without overflow.
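The case-study numbers can be checked in plain Python (using float64 `math.exp` as a stand-in, since the standard library has no FP16 type; `FP16_MAX` is the published IEEE half-precision limit, and the variable names are illustrative):

```python
import math

FP16_MAX = 65504.0   # largest finite IEEE 754 half-precision value

row = [10.0, 20.0, 80.0]

# Naive path: e^80 ≈ 5.5e34 dwarfs FP16_MAX, so in half precision those
# exponentials saturate to inf and the final division produces NaN.
overflows = [x for x in row if math.exp(x) > FP16_MAX]

# Max Trick: shift by the row max before exponentiating.
m = max(row)                           # 80.0
shifted = [x - m for x in row]         # [-70.0, -60.0, 0.0]
exps = [math.exp(x) for x in shifted]  # all <= e^0 = 1, safely inside FP16 range
probs = [e / sum(exps) for e in exps]
```

Note that even $e^{20}$ already exceeds the FP16 limit, so the overflow is not an edge case confined to extreme inputs; the max subtraction is mandatory for any half-precision Softmax kernel.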